Abstract

Image segmentation has been a very active research topic in image
analysis. Currently, most image segmentation algorithms are designed
around the idea that an image is partitioned into a set of regions that
are homogeneous within each region and inhomogeneous between regions.
However, human visual intuition does not always follow this pattern. A
new image segmentation method named Visual-Hint Boundary to Segment
(VHBS) is introduced, which is more consistent with human perception.
VHBS abides by two visual hint rules based on human perception: (i) the
global scale boundaries tend to be the real boundaries of the objects;
(ii) two adjacent regions with quite different colors or textures tend to
produce a real boundary between them. Experiments demonstrate that,
compared with traditional image segmentation methods, VHBS achieves
better performance and also higher computational efficiency.



1 Introduction 

Image segmentation is a vast topic in image analysis. In this chapter, we
present a low-level image segmentation method, which has been proposed
to segment images in a way that agrees with human perception. In recent
years, most image segmentation algorithms have been designed around an
idea that partitions the images into a set of regions preserving homogeneous
intra-regions and inhomogeneous inter-regions. Following this idea, these
methods segment images in a classification or clustering manner. However,
human visual intuition does not always follow this manner. The goal of this
research is to define a low-level image segmentation algorithm that is
consistent with human visual perception.

The proposed new image segmentation method is called Visual-Hint Boundary
to Segment (VHBS). VHBS abides by two visual hint rules based on human
perception: (i) the global scale boundaries tend to be the real boundaries
of the objects; (ii) two adjacent regions with quite different colors or
textures tend to produce a real boundary between them. Compared with
other unsupervised segmentation methods, the outputs of VHBS are more
consistent with human perception. Besides, reducing complexity is another
objective of VHBS, since high-performance segmentation methods are usually
computationally intensive. Therefore, the concepts of chaos and non-chaos
are introduced in VHBS to prevent the algorithm from going down to the
details of the pixel level. This guarantees that the segmentation process
stays at a coarse level and keeps its computational efficiency.

VHBS is composed of two phases, Entropy-driven Hybrid Segmentation (EDHS)
and Hierarchical Probability Segmentation (HPS). VHBS starts with EDHS,
which produces a set of initial segments by combining local regions and
boundaries. These local regions and boundaries are generated by a top-down
decomposition process, and the initial segments are formed by a bottom-up
composition process. The top-down decomposition process recursively
decomposes the given image into small partitions, with a stopping
condition for each branch of the decomposition. We use an entropy
measurement as the stopping condition, since a smaller entropy of a local
partition implies lower disorder in that partition. To preserve
computational efficiency, we set a size threshold on the partitions to
prevent the decomposition from going down to the pixel level. Based on
this threshold, local partitions are grouped into two types: chaos if the
size of a partition is less than the threshold, and non-chaos otherwise.
Local regions and boundaries are computed within the local partitions.
Each local region is described by a vector called a feature descriptor,
and the local boundaries are weighted by probabilities. To calculate the
probabilities, we design two scale filters, $f_1$ and $f_2$, which are
based on the two visual hints (i) and (ii) respectively. The boundaries
between two adjacent regions are weighted by the product of $f_1$ and
$f_2$. A bottom-up composition process follows and combines these local
regions and boundaries to form a set of initial segments,
$S = (s_0, \dots, s_n)$.

The second phase of VHBS is Hierarchical Probability Segmentation (HPS),
which constructs a Probability Binary Tree (PBT) based on the initial
segments $S$. PBT presents a hierarchy of segments based on the boundary
probabilities between the initial segments, which form the leaves of PBT.
The root represents the original image, and the internal nodes of PBT are
the segments obtained by combining their children. Links are labeled by
the boundary probabilities. PBT can be visualized for a chosen number of
segments and can even provide local details. The difference from the
methods based on MST, such as [201 ESI [21], is that those methods
generate the tree structure based on the similarities between pixels,
whereas our method generates the tree structure based on probabilities
between regions. This makes the algorithm insensitive to noise and greatly
reduces the computational complexity. A similar approach is proposed by
[TJ. Compared with that approach, VHBS is more efficient, since VHBS
prevents the decomposition process from going down to the pixel level by
setting a chaos threshold. The novel aspects of VHBS include:

1. Visual hints: the algorithm abides by two visual hint rules, which make
the outputs of VHBS more consistent with human perception;

2. Feature detection: VHBS outputs a set of feature descriptors, which
describe the features of each segment;

3. Computational efficiency: VHBS has high computational efficiency
since the algorithm does not go down to the pixel level;

4. Hybrid algorithm: VHBS combines edge-, region-, cluster- and
graph-based techniques.

2 Related work

Image segmentation is one of the major research areas in image analysis
and has been explored for many years. Typically, segmentation methods
partition the given image into a set of segments, where a segment contains
a set of pixels that are highly similar within the segment and maximally
different from the pixels of other segments. Some examples of classical
image segmentation algorithms are k-means clustering, histogram
thresholding, region growing and the watershed transformation. These
methods are efficient and easy to understand, but they have obvious
weaknesses that become barriers for applications. These weaknesses include
sensitivity to image noise and textures; improper merging of two disjoint
areas with similar gray values by histogram thresholding methods;
incorrect outputs caused by improper initial conditions; and a tendency
toward excessive over-segmentation by watershed transformation methods
[27]. All these examples demonstrate that image segmentation is an
extensive and difficult research area. In recent years, numerous image
segmentation methods have been proposed that largely overcome these
weaknesses. Commonly, segmentation algorithms fall into one or more of the
following categories: edge-based, region-based, cluster-based and
graph-based.

The idea of edge-based segmentation methods is straightforward. Contours
and segments are highly correlated: closed contours give the segments,
and segments automatically produce the boundaries. Edge-based segmentation
methods first locate contours in the image and then use these contours as
the boundaries of the segments. Therefore, much research has focused on
contour detection. The classical approaches detect edges by looking for
discontinuities of brightness, as in Canny edge detection [6]. [US]
demonstrates that approaches based on discontinuities of brightness are
inadequate models for locating the boundaries in natural images. One
reason is that texture is a common phenomenon in natural images and
produces unwanted edges. Another reason is that locating segments by edges
requires closed boundaries, whereas these approaches usually provide
discontinuous contours, which are inadequate for locating the segments. In
recent years, many high-performance contour detectors have been proposed,
such as [12], [36], [32], [55], [33]. One category of contour detectors
locates the boundaries of an image by measuring local features. To improve
edge detection performance on natural images, some approaches consider one
descriptor, or combine several descriptors, for each pixel in several
feature channels over different scales and orientations. [36] proposes a
learning scheme that considers brightness, color and texture features at
each local position and uses a classifier to combine these local features.
Building on [36], [33] adds a spectral component to form the so-called
globalized probability of boundary, which improves the accuracy of
boundary detection. Many boundary detection and segmentation methods are
based on the oriented energy approach [37]. Examples of these approaches
are [5TJ HU]. To achieve high accuracy, these approaches usually combine
more than one local feature channel. The computational complexity becomes
a bottleneck if the application requires high computational efficiency.
Of course, there are many other boundary detection and segmentation
algorithms based on rich texture analysis, such as jH I2S1 E2]. However,
highly accurate image contour and segment detection methods are
computationally intensive pQ.

To locate the segments from the contours, the contours must form closed
regions. An example of such research is [45], which locates the regions by
bridging disconnected contours or by contour tracking. Another recent work
is PQ, which can be divided into two phases: (i) the Oriented Watershed
Transform (OWT) produces a set of initial regions from a contour detector.
The paper selects gPb, proposed by [33], as the contour detector, since it
gives high accuracy on the BSDB benchmark [35]; (ii) the Ultrametric
Contour Map (UCM) constructs the hierarchical segments. A tree is
generated by a greedy graph-based region merging algorithm, where the
leaves are the initial regions and the root is the entire image. The
segmentation algorithm proposed by pQ has high segmentation accuracy, but
its disadvantage is obvious: gPb is an expensive contour detector and
provides fine-gradient initial regions. It can be shown that constructing
hierarchical segments over such a fine gradient is also computationally
intensive. Other examples of recent contour-to-segment research are [22],
[23].

Typically, a region-based algorithm is combined with clustering techniques
to assemble the sub-regions into final segments, and numerous methods fall
into these two schemes, such as gHl EOl HDl H21 IIH I2H H3]. The commonly
used techniques include region growing, split-merge, and supervised and
unsupervised classification. [21] proposes a region growing algorithm,
called random walk segmentation, which is a multi-label, user-interactive
image segmentation method. This algorithm starts from a small number of
seeds with user-predefined labels. For each unlabeled pixel, the random
walk algorithm determines the probability that a random walker starting at
that pixel first reaches each of the user-predefined seeds. Each pixel is
then assigned the label with the greatest probability. Mean shift [13] is
a well-known density estimation clustering algorithm that has been widely
used for image segmentation and object tracking. Based on the probability
distribution over the feature domain, the algorithm iteratively climbs the
gradient to locate the nearest peak. [13] demonstrates that mean shift
provides good segmentation results and is suitable for real data analysis.
However, the quadratic computational complexity of the algorithm is a
disadvantage, and the choice of the moving window size is not trivial.

In recent years, much research has been built on graph-theoretic
techniques. It has been demonstrated by HH [201 HS1 HS1 HH1 HS1 12U E21
El EH1 EH El EH EH] that these approaches support image segmentation as
well. As pointed out by [26], graph-based segmentation can be roughly
divided into two groups: tree-structure segmentation and graph-cut
segmentation. Treating a 2D image as a space $P$, both approaches view $P$
as a collection of subgraphs $\langle p_1, \dots, p_n \rangle$, where each
$p_i$ is an undivided partition, $P = \bigcup p_i$ and
$\emptyset = p_i \cap p_j$ for all $1 \le i \ne j \le n$. Commonly, $p_i$
denotes a pixel of the image. Tree-structure segmentation
[201 HS1 ESI UHl [HI [151 EJJ expresses the split-merge process in a
hierarchical manner. The links between parents and children indicate the
inclusion relationship, and the nodes of the tree denote the pieces of
subgraphs. Graph-cut segmentation [52"1 El ESI EIH El EH HZ] views each
element of $\langle p_1, \dots, p_n \rangle$ as a vertex, and the edges of
the graph are defined by the similarities between adjacent vertices. This
process forms a weighted undirected graph $G = (V, E)$ and relies on graph
cutting to partition the graph.

A common tree-structure approach is the minimum spanning tree (MST) [41].
[2"01 |2"T] propose an algorithm based on MST that uses the local
variation of intensities to locate the proper granularity of the segments,
based on Kruskal's minimum spanning tree (KMST). Another recent example of
the tree-structure approach is [16]. The purpose of this approach is to
find semantically coherent regions based on an $\varepsilon$-neighbor
coherence segmentation criterion, using the so-called connected coherence
tree algorithm (CCTA). Rather than generating a tree based on pixel
similarities, pQ generates a tree structure based on region similarities.
A tree structure based on region similarities should have lower
computational complexity than one based on pixel similarities, since
$|V|$ is greatly reduced by replacing pixels with regions.

Graph-cut approaches are also called spectral clustering. The basic idea
of the graph-cut approach is to partition $G = (V, E)$ into disjoint
subsets by removing the edges linking the subsets. Among these approaches,
the normalized cut (Ncut) [H] is widely used. Ncut proposes a cut
criterion to be minimized, which measures the cut cost as a fraction of
the total edge connections to all the nodes in the graph. The paper
demonstrates that minimizing the normalized cut criterion is equivalent to
solving a generalized eigenvector system. Other recent examples of
graph-cut approaches are [521 E]. Graph-cut approaches have been proved to
be NP-complete problems and are computationally expensive. These
disadvantages are the main barriers for graph-cut methods.
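
To make the normalized cut criterion concrete, the following minimal sketch evaluates the cost of a given two-way partition using the usual definition Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). It only scores a partition and does not search for the minimizing one. The function name `ncut_cost`, the edge-dictionary representation and the toy graph are assumptions made for illustration.

```python
def ncut_cost(weights, part_a):
    """Normalized cut cost of the partition (part_a, rest of the vertices).

    weights: dict mapping an undirected edge (u, v) to its similarity weight
             (illustrative representation, each edge listed once).
    part_a:  iterable of vertices forming one side of the cut.
    """
    part_a = set(part_a)
    cut = assoc_a = assoc_b = 0.0
    for (u, v), w in weights.items():
        u_in_a, v_in_a = u in part_a, v in part_a
        if u_in_a != v_in_a:
            # Edge crosses the cut: counted once in the cut and once in
            # the association of each side.
            cut += w
            assoc_a += w
            assoc_b += w
        elif u_in_a:
            assoc_a += 2 * w  # internal edge, seen from both endpoints
        else:
            assoc_b += 2 * w
    return cut / assoc_a + cut / assoc_b


# Toy graph: two tightly connected pairs joined by one weak edge.
toy = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 0.1}
good = ncut_cost(toy, {0, 1})   # cuts only the weak edge
bad = ncut_cost(toy, {0})       # cuts a strong edge
```

On this toy graph the partition that removes only the weak edge has the lower cost, which is the behavior the criterion is designed to reward.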

3 Entropy-driven Hybrid Segmentation

Entropy-driven Hybrid Segmentation (EDHS) begins with a top-down
decomposition of the original input image and finishes with a bottom-up
composition. Top-down decomposition repeatedly partitions a given image
into quadrants and correspondingly produces a quadtree based on a stopping
condition. EDHS uses an edge detector, such as the Canny detector [6], to
locate the boundaries between the local regions in the leaves. These
boundaries are weighted by probabilities computed from the two visual
hint rules.

Bottom-up composition recursively combines local regions whenever two
adjacent local regions share a boundary with zero probability. This
process forms the initial segments, $S = (s_0, \dots, s_n)$, and a set of
probabilities, $C_b = \{c_{i,j}\}$, which describes the weights of the
boundaries between each pair of adjacent initial segments. The indices $i$
and $j$ denote two initial segments $s_i$ and $s_j$ that share a boundary
valued by a real number $c_{i,j} \in [0, 1]$, $0 \le i \ne j \le n$. For
each initial segment $s_i$, a feature vector,
$fd_i = \langle v_1, \dots, v_m \rangle$, is generated to describe this
segment. The feature descriptor, like the CF in BIRCH [17], summarizes the
important features of each area (cluster). Although the specific values
used in the feature descriptor may vary, in this chapter we assume
$\langle r, g, b \rangle$, where $r$, $g$ and $b$ are the mean values of
the red, green and blue color channels.
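
The bottom-up composition can be sketched with a union-find structure. This is a minimal illustration under stated assumptions, not the chapter's implementation: local regions are given as pixel counts with mean RGB values, regions joined by a zero-probability boundary are merged, and when several local boundaries separate the same pair of merged segments the largest probability is kept. That aggregation rule, the function name `compose_segments` and the input layout are all assumptions.

```python
def compose_segments(regions, boundaries):
    """Merge local regions across zero-probability boundaries.

    regions:    dict region_id -> (pixel_count, (r_mean, g_mean, b_mean))
    boundaries: dict (region_i, region_j) -> boundary probability in [0, 1]
    Returns (descriptors, probs): descriptors[k] is the mean <r, g, b> of
    initial segment k; probs[(a, b)] is the boundary weight between
    segments a < b.
    """
    parent = {rid: rid for rid in regions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Regions sharing a boundary with zero probability belong together.
    for (i, j), prob in boundaries.items():
        if prob == 0.0:
            parent[find(i)] = find(j)

    index, counts, sums = {}, [], []
    for rid in sorted(regions):
        root = find(rid)
        if root not in index:
            index[root] = len(counts)
            counts.append(0)
            sums.append([0.0, 0.0, 0.0])
        n, mean_rgb = regions[rid]
        k = index[root]
        counts[k] += n
        for c in range(3):
            sums[k][c] += n * mean_rgb[c]  # pixel-weighted channel sum
    descriptors = [tuple(s / n for s in sums[k]) for k, n in enumerate(counts)]

    probs = {}
    for (i, j), prob in boundaries.items():
        a, b = index[find(i)], index[find(j)]
        if a != b:
            key = (min(a, b), max(a, b))
            probs[key] = max(prob, probs.get(key, 0.0))  # assumed rule
    return descriptors, probs


local_regions = {0: (4, (10, 10, 10)), 1: (4, (30, 30, 30)), 2: (2, (200, 0, 0))}
local_boundaries = {(0, 1): 0.0, (1, 2): 0.8}
descriptors, probs = compose_segments(local_regions, local_boundaries)
```

Here regions 0 and 1 collapse into one initial segment whose descriptor is the pixel-weighted mean of their colors, while region 2 stays separate.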

3.1 Top-down decomposition 

The decomposition mechanism is a widely used technique in hierarchical
image segmentation [21 EB]. Our decomposition process recursively
decomposes the image into four quadrants. The decomposition process is
represented by an unbalanced quadtree. The root represents the original
image, and the children of each node represent its four partitions. A
stopping condition is assigned to each branch of the decomposition, and
the partition process stops when the stopping condition is reached.
Figure 1 shows an example of the data structures. We summarize the
top-down decomposition as follows:
i Partitioning the images into small pieces reduces the information
presented in local images, which helps VHBS conquer the sub-problems at
the local position;

ii Decomposition provides the relative scale descriptor for the scale
filter to calculate the probabilities of the boundaries. We will discuss
the relative scale descriptor in Section 3.2.1;

iii The divide-and-conquer scheme potentially supports parallel
implementation, which could greatly improve the computational efficiency.

To describe the decomposition, the dyadic rectangle [29] is introduced. A
dyadic rectangle is a member of the family of regions
$I_{s,t} = \{[i2^s, (i+1)2^s - 1] \times [j2^t, (j+1)2^t - 1],\
0 \le i \le n/2^s - 1,\ 0 \le j \le m/2^t - 1\}$, for
$0 \le s \le \log n$, $0 \le t \le \log m$. The dyadic rectangles of $I$
have some nice properties. Every dyadic rectangle is contained in exactly
one "parent" dyadic rectangle, where the "child" dyadic rectangles are
located inside the "parent" dyadic rectangle. The area of a "parent" is
always an integer power of two times the area of a "child" dyadic
rectangle. Mapping the image onto the Cartesian plane, dyadic rectangles
provide a model to uniformly decompose images into sub-images recursively.

Given an $n \times m$ image $I$, assuming that $n$ and $m$ are powers of
2, the set of dyadic rectangles at levels $(0, 0)$ through
$(\log n, \log m)$ forms a complete quadtree, whose root is the level
$(\log n, \log m)$ dyadic rectangle $[0, n-1] \times [0, m-1]$. Each
dyadic rectangle $I_{s,t}$ with level $1 \le s \le \log n$,
$1 \le t \le \log m$ has four children that are dyadic rectangles at
levels $s - 1$ and $t - 1$, which are the four quadrants of $I_{s,t}$.
Suppose $I_{s,t} = [i2^s, (i+1)2^s - 1] \times [j2^t, (j+1)2^t - 1]$, for
$0 \le i < n2^{-s}$, $0 \le j < m2^{-t}$. Then the first quadrant of
$I_{s,t}$ is
$[(2i+1)n2^{-(s+1)}, (2i+2)n2^{-(s+1)} - 1] \times [2jm2^{-(t+1)}, (2j+1)m2^{-(t+1)} - 1]$;
the second quadrant is
$[2in2^{-(s+1)}, (2i+1)n2^{-(s+1)} - 1] \times [2jm2^{-(t+1)}, (2j+1)m2^{-(t+1)} - 1]$;
the third quadrant is
$[2in2^{-(s+1)}, (2i+1)n2^{-(s+1)} - 1] \times [(2j+1)m2^{-(t+1)}, (2j+2)m2^{-(t+1)} - 1]$;
and the fourth quadrant is
$[(2i+1)n2^{-(s+1)}, (2i+2)n2^{-(s+1)} - 1] \times [(2j+1)m2^{-(t+1)}, (2j+2)m2^{-(t+1)} - 1]$.
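
The parent/child relation between dyadic rectangles can be sketched by splitting inclusive index ranges at their midpoints. This sketch works directly on the index ranges rather than on the closed-form expressions, assumes even side lengths of at least 2, and follows the quadrant ordering used in the text (first: upper x half and lower y half; then counter-clockwise). The function name `quadrants` is an assumption for illustration.

```python
def quadrants(x0, x1, y0, y1):
    """Split the inclusive rectangle [x0, x1] x [y0, y1] into four children.

    Assumes both side lengths are even (for example powers of two) and at
    least 2, so each child is again a valid rectangle.
    """
    xm = (x0 + x1 + 1) // 2  # first index of the upper x half
    ym = (y0 + y1 + 1) // 2  # first index of the upper y half
    return [
        (xm, x1, y0, ym - 1),      # first quadrant
        (x0, xm - 1, y0, ym - 1),  # second quadrant
        (x0, xm - 1, ym, y1),      # third quadrant
        (xm, x1, ym, y1),          # fourth quadrant
    ]


children = quadrants(0, 7, 0, 7)
areas = [(a1 - a0 + 1) * (b1 - b0 + 1) for a0, a1, b0, b1 in children]
```

Applying `quadrants` recursively to each child yields the complete quadtree described above; each child covers a quarter of its parent's area.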

3.1.1 Stopping Condition 

In information theory, entropy is a measure of the uncertainty associated
with a random variable [TJj. We choose entropy [43] as the stopping
condition for the top-down decomposition since entropy provides a
measurement of the disorder of a data set. Let $\zeta$ denote the stopping
condition for each branch of the quadtree. If $\zeta$ holds, then EDHS
stops the partition process of this branch. By decreasing the size of
images, the decomposition reduces the information presented in the local
positions. The following definitions introduce the concept of a segment
set, on which the entropy of an image and the K-Color Rule are based.

Figure 1: Top-down decomposition and the quadtree structure

Definition 1 (Segment set:). Given a partition $P$ of the interval
$[a, b]$, $a = j_0 < j_1 < \dots < j_n = b$, where $a$ and $b$ are the
minimum and maximum of the feature values of a given image $I$, $P$ gives
a segment set $X = \{x_1, \dots, x_k\}$, where each $x_i$ is a set of
pixels such that all the pixels in $x_i$ form a connected region and all
their feature values are located in an interval $[j_h, j_{h+1}]$,
$0 \le h < n$. $I = \bigcup x_i$ and $\emptyset = x_i \cap x_j$ for all
$0 < i \ne j \le k$.

Definition 2 (Entropy:). Given a segment set $X = \{x_1, \dots, x_k\}$
based on a partition $P$, the entropy of $I$ to the base $b$ is

$$H_k(X) = -\sum_{i=1}^{k} p(x_i) \log_b p(x_i) \tag{1}$$

where $p$ denotes the probability mass function of the segment set
$X = \{x_i\}$, $1 \le i \le k$. To keep the analysis simple, assume the
logarithm base is $e$. This gives

$$H_k(X) = -\sum_{i=1}^{k} p(x_i) \ln p(x_i) \tag{2}$$



Theorem 1. Let $k$ be the number of segments of an image. The supremum of
$H_k(X)$ is a strictly increasing function of $k$, and the supremum of
$H_k(X)$ is $\ln(k)$.

Proof. Let $H_k(p_1, \dots, p_k)$ denote the entropy of an image with
segment set $X = \{x_1, x_2, \dots, x_k\}$ and $P(X = x_i) = p_i$. To show
that the supremum of $H_k(X)$ is a strictly increasing function of $k$, we
need to show that
$\sup H_k(p_1, \dots, p_k) < \sup H_{k+1}(p'_1, \dots, p'_{k+1})$ for any
$k \in \mathbb{Z}^+$.
By [H] Theorem 2.6.4, we have

$$H_k(p_1, \dots, p_k) \le H_k\left(\frac{1}{k}, \dots, \frac{1}{k}\right)
\quad \text{and} \quad
H_{k+1}(p'_1, \dots, p'_{k+1}) \le H_{k+1}\left(\frac{1}{k+1}, \dots, \frac{1}{k+1}\right)$$

Then we have

$$H_k\left(\frac{1}{k}, \dots, \frac{1}{k}\right) = -\sum_{i=1}^{k} \frac{1}{k} \ln \frac{1}{k} = \ln(k)$$

$$H_{k+1}\left(\frac{1}{k+1}, \dots, \frac{1}{k+1}\right) = -\sum_{i=1}^{k+1} \frac{1}{k+1} \ln \frac{1}{k+1} = \ln(k+1)$$

$$H_{k+1}\left(\frac{1}{k+1}, \dots, \frac{1}{k+1}\right) = \ln(k+1) > \ln(k) = H_k\left(\frac{1}{k}, \dots, \frac{1}{k}\right)$$

□

Definition 3 (K-Color Rule:). Using different colors for different
segments in the segment set $X = \{x_1, \dots, x_k\}$, if the image holds
no more than $k$ segments, which means the image can be covered by $k$
colors, we say that the condition 'K-Color Rule' (K-CR) is true;
otherwise, K-CR is false.

Assume an $m \times n$ image $I$. If $I$ is a one-color (1-CR) image, then
it is a zero-entropy image by Theorem 1. Consider another case: the image
is so complicated that no segment holds more than one pixel. This case
gives the maximum entropy, $\ln(mn)$ (with $k = mn$). Therefore, the range
of $H_k(X)$ for an $m \times n$ image $I$ is $[0, \ln(mn)]$. The larger
the entropy is, the more information is contained in the image.
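
The bounds on the segment-set entropy can be checked numerically with a small sketch. It assumes that the probability of a segment is its share of the image area, which the definitions above leave implicit; the function name `segment_entropy` and the sample area lists are illustrative.

```python
import math


def segment_entropy(areas):
    """Entropy (natural log) of a segment set given the pixel count of each
    segment. The probability of a segment is taken to be its area fraction,
    which is an assumption of this sketch."""
    total = sum(areas)
    return -sum((a / total) * math.log(a / total) for a in areas if a > 0)


one_color = segment_entropy([16])          # 1-CR image
uniform_4 = segment_entropy([4, 4, 4, 4])  # four equal segments
skewed_3 = segment_entropy([10, 3, 3])     # three unequal segments
per_pixel = segment_entropy([1] * 16)      # every pixel is its own segment
```

A one-color image has zero entropy, equal-area segments reach $\ln(k)$, unequal segments stay below it, and one segment per pixel of a $4 \times 4$ image reaches $\ln(16)$.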
Based on this observation about $H_k(X)$, we choose image entropy as the
stopping condition for the top-down decomposition because $H_k(X)$ is
closely related to the number of segments. Here $k$ denotes the number of
segments, and the range of $k$ is $[1, mn]$. If a proper value of $k$ is
chosen for a given image, then $\zeta$ becomes $H_k(X) \le \ln(k)$ by
Theorem 1. That is, for each branch of the quadtree, the decomposition
partitions the given image until the local entropy is no larger than
$\ln(k)$.

3.1.2 K as An Algorithm Tuning Parameter 

The value of $k$ affects the depth of decomposition. A small value of $k$
results in a deep quadtree because $\ln(k)$ is small. Small leaves do not
contain much information, which results in few boundaries within the
leaves. Thus $k$ is a key parameter, since it decides the weights of the
edge-based and region/clustering-based segmentation techniques used in
EDHS. In other words, $k$ is a measurement that indicates the degree to
which each technique is used. Figure 2 shows that $k$ can be viewed as a
sliding block ranging from 1 to $mn$. If $k$ is close to 1, EDHS is closer
to a region/cluster-based segmentation method, since few boundaries are
detected in the leaves. The weight of the edge-based technique increases
as the value of $k$ becomes larger.




Figure 2: Sliding block k and entropy measurement ln(k). The block slides
from k = 1 (pure region/cluster-based segmentation) to k = mn (pure
edge-based segmentation).

Suppose $k = 1$; then the stopping condition $\zeta$ becomes
$H_k(X) \le \ln(1) = 0$. To meet this stopping condition, the
decomposition process goes down to the pixel level if neighboring pixels
are inhomogeneous. EDHS is then a pure region/cluster-based segmentation,
since there is no need to detect boundaries in one-color images.

Suppose $k = mn$; then the stopping condition $\zeta$ becomes
$H_k(X) \le \ln(mn)$. By Theorem 1, $\zeta$ already holds for an
$m \times n$ image, so no decomposition is performed. EDHS is then a pure
edge-based segmentation: it just runs an edge detector to locate the
boundaries that form the local regions.

For an $m \times n$ image, the possible values of $k$ range from 1 to
$mn$. Are all the integers from 1 to $mn$ valid for $k$? The answer is no.
Let us take a close look at the cases $k = 1$ and $k = 2$.

Points, lines and regions are the three essential elements in a
two-dimensional image plane. We are looking for a value of $k$ that can
efficiently recognize lines and regions (we treat a point as noise if it
is inhomogeneous with its neighbors). Keep in mind that the aim of
decomposition is to reduce the disorder, which suggests that $k$ should be
a small integer. When $k = 1$, as discussed above, the leaves are forced
to be one color, and EDHS becomes a pure region/cluster-based
segmentation. Previous algorithms of this type have proved to be
computationally expensive [46].

Consider $k = 2$. The decomposition continually partitions the image until
the local entropy is no larger than $\ln(2)$, which tends to force the
leaves to hold no more than two colors. Suppose a line passes through a
sub-image. To recognize this line, one of the local regions of the leaves
needs to be this line or part of it. This makes the leaves quite small and
forces the decomposition process to go down to the pixel level. Under this
circumstance, the time and space complexities are quite expensive. Another
fatal drawback is that the small size of the leaves makes EDHS sensitive
to noise. Even a single pixel that is inhomogeneous with its neighbors
could cause invalid recognition in the area around it.

$k \ge 3$ is a good choice because it can efficiently recognize lines and
regions in the 2D plane. An example is shown in Figure 3(a). Top-down
decomposition goes down to the pixel level to locate the curve if
$k < 3$. But for $k \ge 3$, no decomposition is needed, since the entropy
of Figure 3(a) must be less than $\ln(3)$ by Theorem 1. This suggests that
EDHS is stable and reliable when $k \ge 3$.

There is an extreme (worst) case that we need to consider. This case is
shown in Figure 3(b). Multiple lines pass through one point, say $p$.
Define a closed ball $B(p, \varepsilon)$, where $p$ is the center and
$\varepsilon > 0$ is the radius of the ball. No matter how small
$\varepsilon$ is, $B(p, \varepsilon)$ contains $2N_l + 1$ segments divided
by $N_l$ lines. In other words, partitioning does not help reduce the
number of segments around the area $B(p, \varepsilon)$. Therefore,
decomposing images into small pieces does not decrease the entropy inside
$B(p, \varepsilon)$. To handle this case, we introduce 'chaos' leaves.

Figure 3: Lines in images. (a) One line; (b) Multiple lines.

As shown in Figure 3(b), the entropy does not decrease inside
$B(p, \varepsilon)$ as the decomposition proceeds. To solve this problem,
we introduce a size threshold, which is the smallest allowed size of the
leaves. If the size of a partition is less than this threshold, the
top-down decomposition does not continue even though the desired $\zeta$
has not been reached. When this happens, we call these leaves chaos.
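
The top-down decomposition with its entropy stopping condition and chaos leaves can be sketched as a short recursion. This is a minimal sketch under several assumptions: the image is a grayscale list of rows, the local entropy is estimated from a coarse histogram of gray values, "size" means the shorter side length of a block, and the names `block_entropy`, `decompose`, `bins` and `min_size` are illustrative rather than taken from the chapter.

```python
import math
from collections import Counter


def block_entropy(img, x0, x1, y0, y1, bins=8, vmax=256):
    """Histogram entropy (natural log) of the gray values inside one block."""
    counts = Counter(
        img[y][x] * bins // vmax
        for y in range(y0, y1 + 1)
        for x in range(x0, x1 + 1)
    )
    total = (x1 - x0 + 1) * (y1 - y0 + 1)
    return -sum(c / total * math.log(c / total) for c in counts.values())


def decompose(img, k=3, min_size=4):
    """Return the quadtree leaves as (x0, x1, y0, y1, kind) tuples."""
    leaves = []

    def visit(x0, x1, y0, y1):
        if block_entropy(img, x0, x1, y0, y1) <= math.log(k):
            # Stopping condition holds: this branch ends normally.
            leaves.append((x0, x1, y0, y1, "non-chaos"))
            return
        side = min(x1 - x0 + 1, y1 - y0 + 1)
        if side < max(min_size, 2):
            # Too small to split further although the condition failed.
            leaves.append((x0, x1, y0, y1, "chaos"))
            return
        xm = (x0 + x1 + 1) // 2
        ym = (y0 + y1 + 1) // 2
        visit(x0, xm - 1, y0, ym - 1)
        visit(xm, x1, y0, ym - 1)
        visit(x0, xm - 1, ym, y1)
        visit(xm, x1, ym, y1)

    visit(0, len(img[0]) - 1, 0, len(img) - 1)
    return leaves


flat = [[50] * 4 for _ in range(4)]
busy = [[0, 32, 0, 32], [64, 96, 64, 96], [0, 32, 0, 32], [64, 96, 64, 96]]
flat_leaves = decompose(flat, k=3, min_size=4)
busy_leaves = decompose(busy, k=3, min_size=4)
```

A uniform image stops at the root, while the busy pattern is split once and then yields four chaos leaves because every 2 x 2 block still holds four distinct gray levels.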

3.1.3 Approximate Image Entropy 

To calculate the entropy defined by Definition 2, we need to know the
probability distribution $\Pr(X = x_i) = p_i$, where
$X = \{x_1, \dots, x_k\}$ is the set of segments of $I$. In most cases, we
have no prior knowledge of the distribution of the segments of a given
image. In other words, we are not able to directly compute the image
entropy defined by Definition 2. Definition 4 gives an alternative
calculation called approximate image entropy, which does not require any
prior knowledge of the distribution of the segments but provides an
approximate entropy value.

Definition 4 (approximate entropy:). Given a partition $P$ of the interval
$[a, b]$, $a = j_0 < j_1 < \dots < j_n = b$, where $a$ and $b$ are the
minimum and maximum of the feature values of a given image $I$, the
approximate entropy $H(V)$ is defined as

$$H(V) = -\sum_{i=1}^{n} p_i \log_b p_i \tag{3}$$

where $p$ denotes the probability mass function of the feature value set
$V = \{v_1, \dots, v_n\}$. Each $v_i$ denotes the collection of pixels
whose feature values are located in $[j_{i-1}, j_i]$, and
$p_i = \Pr(V = v_i)$, $1 \le i \le n$. After setting the logarithm base to
$e$, $H(V)$ becomes

$$H(V) = -\sum_{i=1}^{n} p_i \ln p_i \tag{4}$$
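
The approximate entropy only needs a histogram of feature values over the partition intervals, which the following minimal sketch computes. The function name `approximate_entropy`, the flat list of values and the treatment of interval end points (half-open intervals, with the maximum value placed in the last interval) are assumptions of this sketch.

```python
import math
from bisect import bisect_right


def approximate_entropy(values, cuts):
    """Approximate entropy H(V) of pixel feature values.

    values: flat list of pixel feature values.
    cuts:   increasing partition points [j_0, j_1, ..., j_n] with
            j_0 <= min(values) and j_n >= max(values).
    """
    n = len(cuts) - 1
    counts = [0] * n
    for v in values:
        # Index of the interval containing v; j_n falls in the last one.
        h = min(bisect_right(cuts, v) - 1, n - 1)
        counts[h] += 1
    total = len(values)
    return -sum(c / total * math.log(c / total) for c in counts if c)


two_tone = approximate_entropy([0] * 8 + [255] * 8, [0, 128, 255])
single_tone = approximate_entropy([10] * 16, [0, 128, 255])
```

No segment information is required: the result depends only on how many pixels fall into each interval, not on where they are located in the image.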

Theorem 2. Given an image $I$ and a partition $P$, $H(V)$ is less than or
equal to $H(X)$.

Proof. Given an image $I$ and a partition $P = [j_0, j_1, \dots, j_n]$,
where $a = j_0 < j_1 < \dots < j_n = b$, and $a$ and $b$ are the minimum
and maximum of the pixel feature values of $I$. Let $k$ be the number of
segments defined in Definition 2. By Definition 2, $k$ must be greater
than or equal to $n$. Two cases need to be considered: $k = n$ and
$k > n$.

Case I: if $k = n$, then by Definition 2 and Definition 4,
$p(x_i) = p(v_i)$, which gives $H(X) = H(V)$.

Case II: if $k > n$, there must exist at least two segments whose feature
values are located in the same partition interval. Without loss of
generality, assume $H(X)$ and $H(V)$ are defined over the partition $P$,
where $X$ has two segments $x_{h'}$ and $x_{h''}$ whose feature values are
both located in the interval $[j_{h-1}, j_h]$, $1 \le h \le n$. If we can
prove $H(X) > H(V)$, then the theorem follows by repeating the following
argument as many times as needed.

By Definition 2 and Definition 4 respectively,

$$H(X) = -\bigl(p(x_1) \ln p(x_1) + \dots + p(x_{h'}) \ln p(x_{h'}) + p(x_{h''}) \ln p(x_{h''}) + \dots + p(x_n) \ln p(x_n)\bigr)$$

$$H(V) = -\bigl(p(x_1) \ln p(x_1) + \dots + p(x_h) \ln p(x_h) + \dots + p(x_n) \ln p(x_n)\bigr)$$

where $p(x_h) = p(x_{h'}) + p(x_{h''})$, $0 < p(x_h) < 1$ and
$0 < p(x_{h'}), p(x_{h''}) < p(x_h)$.

To prove $H(X) > H(V)$, we consider $H(X) - H(V)$:

$$H(X) - H(V) = p(x_h) \ln p(x_h) - \bigl(p(x_{h'}) \ln p(x_{h'}) + p(x_{h''}) \ln p(x_{h''})\bigr)$$

Let $p(x_h) = y$ and $p(x_{h'}) = x$, so that $p(x_{h''}) = y - x$, where
$0 < x < y < 1$. Then

$$H(X) - H(V) = y \ln y - \bigl(x \ln x + (y - x) \ln(y - x)\bigr)
= \ln y^y - \bigl(\ln x^x + \ln (y - x)^{(y - x)}\bigr)
= \ln y^y - \ln x^x (y - x)^{(y - x)}$$

Let $f(x, y) = y^y - x^x (y - x)^{(y - x)}$. If we can prove
$f(x, y) > 0$ for all $0 < x < y < 1$, then
$H(X) - H(V) = \ln y^y - \ln x^x (y - x)^{(y - x)} > 0$, since $\ln(x)$ is
an increasing function.

$$f(x, y) = y^y - x^x (y - x)^{(y - x)}
= y^y - x^x \frac{(y - x)^y}{(y - x)^x}
= y^y - \left(\frac{x}{y - x}\right)^x (y - x)^y
= y^y \left(1 - \left(\frac{x}{y - x}\right)^x \left(\frac{y - x}{y}\right)^y\right)$$

Let $\alpha = \frac{x}{y - x}$; then
$f(x, y) = y^y \left(1 - \frac{\alpha^x}{(1 + \alpha)^y}\right)$.

Notice that $\alpha^x < (1 + \alpha)^y$, since $\alpha$ is a positive
number and $0 < x < y < 1$. Then $1 - \frac{\alpha^x}{(1 + \alpha)^y}$ is
positive. This implies $f(x, y) > 0$ because $y^y > 0$.

□

Assume an image $I$ for which $k$-CR is true. This implies
$H_k(X) \le \ln(k)$ by Theorem 1. By Theorem 2, $H(V)$ must be less than
or equal to $H_k(X)$. This gives the logical chain: $k$-CR is true
$\Rightarrow H_k(X) \le \ln(k) \Rightarrow H(V) \le \ln(k)$. Both
$H_k(X) \le \ln(k)$ and $H(V) \le \ln(k)$ are necessary but not sufficient
conditions for the truth of $k$-CR.
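
The relation between the two entropies can be checked on small images. The sketch below computes $H(V)$ from interval counts and $H(X)$ from connected regions of pixels that share an interval. It assumes 4-connectivity, takes a segment's probability to be its area fraction, and uses half-open intervals with the maximum value in the last interval; none of these choices is specified in the text, and the function name `entropies` is illustrative.

```python
import math
from collections import deque


def entropies(img, cuts):
    """Return (H_V, H_X) for a 2D list of feature values."""
    h, w = len(img), len(img[0])
    n = len(cuts) - 1
    total = h * w

    def interval(v):
        for idx in range(n):
            if v < cuts[idx + 1]:
                return idx
        return n - 1  # the maximum value belongs to the last interval

    labels = [[interval(v) for v in row] for row in img]

    hist = {}
    for row in labels:
        for lab in row:
            hist[lab] = hist.get(lab, 0) + 1

    # Segments: 4-connected components of pixels with the same interval.
    seen = [[False] * w for _ in range(h)]
    areas = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and labels[ny][nx] == labels[y][x]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)

    def entropy(counts):
        return -sum(c / total * math.log(c / total) for c in counts)

    return entropy(hist.values()), entropy(areas)


stripes = [[0, 255, 0, 255], [0, 255, 0, 255]]
halves = [[0, 0, 255, 255], [0, 0, 255, 255]]
hv_stripes, hx_stripes = entropies(stripes, [0, 128, 255])
hv_halves, hx_halves = entropies(halves, [0, 128, 255])
```

The striped image has two value intervals but four segments, so $H(V) = \ln(2)$ is strictly below $H(X) = \ln(4)$; for the two-halves image the segment count equals the interval count and the two entropies coincide.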
3.1.4 Noise Segments

The term noise of an image usually refers to unwanted color or texture in- 
formation. We do not count a small drop of ink on an A4 paper as a valid 
segment. In most circumstances, it would be considered as noise. How to 
distinguish those valid and invalid segments is an important issue. 

Definition 5 (Dominant and Noise segments:). Let $p$ denote the
probability mass function of the segment set $X = \{x_1, \dots, x_k\}$ in
$I$. Given a threshold $t_{noise} \in [0, 1]$, $x_i$ is a noise segment if
$\Pr(X = x_i) = p_i < t_{noise}$ and $\sum_{p_i < t_{noise}} p_i$ is much
less than $\sum_{p_i \ge t_{noise}} p_i$, $1 \le i \le k$. Other segments
are called dominant segments.

If the segments are small enough and their total area occupies only a small portion of a given image, we call them noise segments. The first requirement is natural because noise segments should be small. The second condition excludes the case in which an image is composed entirely of small pieces.

The value of $k$ in $k$-CR (Definition 3) refers to the number of dominant segments. When noise segments are present, the supremum of $H(V)$ given by Theorems 1 and 2 for this image is no longer $\ln(k)$; the noise supremum of $H(V)$ should be slightly larger than $\ln(k)$. Letting $\epsilon$ denote the noise redundancy, the redundancy stopping condition $\zeta_r$ becomes $H(V) \le \ln(k) + \epsilon$.

Consider dividing the segments into two groups, noise and dominant segments. By Definition 4, $H(V)$ can be written as follows:

$$H(V) = -\sum_{i} p(x_i)\ln p(x_i) = -\sum_{p(x_i) \ge t_{noise}} p(x_i)\ln p(x_i) - \sum_{p(x_i) < t_{noise}} p(x_i)\ln p(x_i)$$

Given an image $I$, let $a$ be the total portion of dominant segments. Then $a = \sum_{p(x_i) \ge t_{noise}} p(x_i)$, and the remaining area, $1 - a = \sum_{p(x_i) < t_{noise}} p(x_i)$, is the portion of noise segments. By Definition 5, $a \gg (1 - a)$. Let $k$ and $k'$ be the number of dominant and noise segments respectively. After applying Theorem 2.6.4 [13], we get the noise supremum of $H(V)$ as follows.

$$H(V) = -\sum_{p(x_i) \ge t_{noise}} p(x_i)\ln p(x_i) - \sum_{p(x_i) < t_{noise}} p(x_i)\ln p(x_i) \le -\Bigl(a\ln\frac{a}{k} + (1-a)\ln\frac{1-a}{k'}\Bigr)$$

The noise redundancy is $\epsilon = -\bigl(a\ln\frac{a}{k} + (1-a)\ln\frac{1-a}{k'}\bigr) - \ln(k)$. The redundancy stopping condition $\zeta_r$ becomes $H(V) \le -\bigl(a\ln\frac{a}{k} + (1-a)\ln\frac{1-a}{k'}\bigr)$.

The following example computes the noise redundancy. Suppose a 3-CR application ($k = 3$), with $a = 0.98$ and $k' = 3$. The redundancy stopping condition for 3-CR gives $-\bigl(a\ln\frac{a}{k} + (1-a)\ln\frac{1-a}{k'}\bigr) = 1.1968$, which is slightly greater than $\ln(k) = 1.0986$. The noise redundancy is $\epsilon = -\bigl(a\ln\frac{a}{k} + (1-a)\ln\frac{1-a}{k'}\bigr) - \ln(k) = 0.0981$.
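
The worked example can be reproduced with a few lines of Python. This is a minimal sketch; the function names `noise_entropy_bound` and `noise_redundancy` are illustrative and do not come from the original implementation.

```python
import math

def noise_entropy_bound(a, k, k_noise):
    """Noise supremum of H(V): -(a ln(a/k) + (1 - a) ln((1 - a)/k'))."""
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    return -(a * math.log(a / k) + (1 - a) * math.log((1 - a) / k_noise))

def noise_redundancy(a, k, k_noise):
    """Noise redundancy: the noise supremum minus ln(k)."""
    return noise_entropy_bound(a, k, k_noise) - math.log(k)

bound = noise_entropy_bound(0.98, 3, 3)
eps = noise_redundancy(0.98, 3, 3)
print(f"bound = {bound:.3f}, redundancy = {eps:.3f}")
```

To three decimal places this gives a bound of about 1.197 and a redundancy of about 0.098, in line with the values quoted in the example.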

We summarize the top-down decomposition in Algorithm 1 and show some examples of the top-down decomposition in Figure 4 for different values of $k$, where $l = 3$, $a = 0.998$ and $k' = 3$.



input : I: an image
output: T_qud: a decomposition quadtree

if size of I < l then
    // current I is chaos
    Create a chaos leaf for I and generate a feature descriptor for I.
else
    if H(V) > -(a ln(a/k) + (1 - a) ln((1 - a)/k')) then
        Partition I into four partitions: quad1, quad2, quad3, quad4;
        Append quad1, quad2, quad3 and quad4 as children of I in T_qud;
        Topdowndecomposition(quad1);
        Topdowndecomposition(quad2);
        Topdowndecomposition(quad3);
        Topdowndecomposition(quad4);
    else
        Locate the local regions by detecting the boundaries within I;
        Create a non-chaos leaf and generate a feature descriptor for each local region;

Algorithm 1: Topdowndecomposition
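
The control flow of Algorithm 1 can be sketched in Python as follows. Several simplifications are assumptions of this sketch rather than details from the text: the image is a 2-D list of already quantized values, $H(V)$ is the entropy of the value histogram, the size test uses the shorter side so that every split is non-empty, and boundary detection and feature descriptors are omitted. All function names are hypothetical.

```python
import math
from collections import Counter

def value_entropy(block):
    """Entropy (nats) of the value histogram of a 2-D block."""
    counts = Counter(v for row in block for v in row)
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

def stop_bound(a, k, k_noise):
    """Redundancy stopping bound -(a ln(a/k) + (1 - a) ln((1 - a)/k'))."""
    return -(a * math.log(a / k) + (1 - a) * math.log((1 - a) / k_noise))

def decompose(block, l=3, a=0.998, k=3, k_noise=3):
    """Return a nested dict that mirrors the decomposition quadtree."""
    if l < 2:
        raise ValueError("this sketch assumes a chaos threshold l >= 2")
    h, w = len(block), len(block[0])
    if min(h, w) < l:                       # too small to analyse: chaos leaf
        return {"kind": "chaos", "size": (h, w)}
    if value_entropy(block) > stop_bound(a, k, k_noise):
        mh, mw = h // 2, w // 2             # split into four quadrants
        quads = [
            [row[:mw] for row in block[:mh]],
            [row[mw:] for row in block[:mh]],
            [row[:mw] for row in block[mh:]],
            [row[mw:] for row in block[mh:]],
        ]
        children = [decompose(q, l, a, k, k_noise) for q in quads]
        return {"kind": "inner", "size": (h, w), "children": children}
    # Stopping condition holds: local regions would be located here.
    return {"kind": "non-chaos", "size": (h, w)}

def count_leaves(node, kind):
    """Count the leaves of the given kind below node."""
    if node["kind"] == "inner":
        return sum(count_leaves(c, kind) for c in node["children"])
    return 1 if node["kind"] == kind else 0

flat = [[0] * 8 for _ in range(8)]                       # one uniform region
busy = [[r * 8 + c for c in range(8)] for r in range(8)]  # 64 distinct values
print(decompose(flat)["kind"], count_leaves(decompose(busy), "chaos"))
```

A uniform image stops at the root as a single non-chaos leaf, whereas an image in which every pixel differs is split until the partitions fall below the chaos threshold.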



3.2 Bottom-up Composition 

Bottom-up composition is the kernel of VHBS, since this process combines the local regions at the leaves of the quadtree to form the initial segment set $S = (s_0, \dots, s_n)$. It also calculates the probabilities of the boundaries between these initial segments. At the same time, the bottom-up composition process generates the feature descriptor of each initial segment by combining the local region feature descriptors.

Figure 4: Quarter decomposition by different stopping conditions

The probabilities of the boundaries between these initial segments are computed by two filters, which are designed from the two visual hint rules (i) and (ii) respectively. The first, called the scale filter $f_1$, follows rule (i): the probabilities are measured by the length of the boundaries, and longer boundaries receive higher probabilities. The second, called the similarity filter $f_2$, follows rule (ii): the probabilities are measured by the differences between two adjacent regions, and the more the features of two adjacent regions differ, the higher the probability of the boundary between them. The final weight of a boundary is a trade-off between the two filters, obtained as the product of their outputs. If the probability of the boundary between two local regions is zero, the two local regions are combined.

3.2.1 Scale Filter 

The scale filter is defined from visual hint (i): the global scale boundaries tend to be the real boundaries of the objects. This suggests that boundaries caused by local texture are unlikely to be the boundaries of the objects of interest, because large objects are more likely to be the objects of interest. To measure the relative length of each boundary, we use the sizes of the decomposition partitions in which the objects are fully located. Local-scale boundaries are unlikely to extend across several partitions since they are short. Based on this observation, we define the scale filter $f_1$ in terms of the sizes of the partitions.

The scale filter $f_1$ is a function that calculates the confidence of a boundary based on the scale observations. Its input parameter is the relative scale descriptor $s$, which is the ratio between the size of the local partition and the size of the original image; $s$ measures the relative scale of the boundaries. We take the size of an image to be the length of its longer side. An example of calculating the scale descriptor $s$ is shown in Figure 5. The boundaries inside the marked partition have the relative scale descriptor $s$, which is defined as:

$$s = z/L \qquad (5)$$

where $z$ is the size of a partition and $L$ is the size of the original image.




Figure 5: The relative scale descriptor 

Based on visual hint (i), $f_1(s)$ must be a strictly increasing function on the domain $[0, 1]$, with its range inside the interval $[0, 1]$. If $s$ is small, the confidence of the boundary should be small, since the boundary is confined to a small area. If $s$ is close to one, the boundary has high confidence. The gradient of $f_1(s)$ is decreasing, because human perception does not depend linearly on the relative scale descriptor: for the same difference in length, human perception is more sensitive when both boundaries are short than when both are long. Therefore we define $f_1(s)$ as follows:

$$f_1(s) = \frac{2}{1 + e^{-\beta_1 s}} - 1 \qquad (6)$$

where $\beta_1$ is the scale damping coefficient and $s$ is the relative scale descriptor. Figure 6 gives $f_1(s)$ for different scale damping coefficients.

Figure 6: Scale filter $f_1(s)$ with different scale damping coefficients.

3.2.2 Similarity Filter 

Compared with regions of similar color or texture, regions with clearly different features make a stronger impression on human perception. Visual hint (ii) suggests that two adjacent regions with different colors or textures tend to produce a high-confidence boundary between them. Based on this observation, we define a similarity filter $f_2$ to filter out the boundary signals that pass through similar regions.

The similarity filter $f_2$ is a function that calculates the confidence of a boundary based on the similarity measurement between two adjacent regions. Examples of similarity measurements are Dice, Jaccard, Cosine and Overlap. The similarity measurement is a real number in the interval $[0, 1]$; the higher the value, the more similar the two regions are. A similarity of 1 means that the regions are exactly the same and 0 means that they are totally different. Based on visual hint (ii), a small similarity measurement should result in a high-confidence boundary. Let $x$ denote the similarity measurement of two adjacent regions. Then $f_2(x)$ is a strictly decreasing function over the domain $[0, 1]$ and its range is inside $[0, 1]$. Human perception is not linearly related to the similarity measure: it is sensitive to regions with obviously different colors or textures. For example, similarity measures of 0.1 and 0.2 make little difference to human vision because in both cases the regions are obviously different. This suggests that the gradient of $f_2(x)$ is decreasing over the domain $[0, 1]$. We therefore define $f_2(x)$ as follows to satisfy the requirements above.

am = ( 1+e - fe .-.) - Wirfs) (7) 

where $\beta_2$ is the similarity damping coefficient and $x$ is the similarity measurement between two adjacent regions, defined as $x = similar(fd_1, fd_2) \in [0, 1]$; $fd_1$ and $fd_2$ are the feature descriptors of the local regions. Figure 7 gives the curves of $f_2(x)$ for different similarity damping coefficients.

Figure 7: Similarity filter $f_2(x)$ with different similarity damping coefficients.

We implemented VHBS using $f_2(x)$ given by equation 7 and found that the boundary signals are over-damped by the similarity filter $f_2(x)$. The algorithm assigns low confidence values to global-scale boundaries whose adjacent regions have similar feature descriptors, yet human vision is also sensitive to this sort of boundary. To avoid these cases, we redesign the similarity filter so that it takes the relative scale descriptor into account as well. The similarity filter is redefined as $f_2(x, s)$, which outputs high confidence weights when either parameter $s$ or $x$ is close to one. We also introduce a similarity threshold $t$. If the similarity measurement is higher than this threshold, the algorithm sets the confidence to 0, which means that there is no boundary between two adjacent regions when human vision cannot tell them apart.

f (i+F§^ iix<t 
Mx,8)=l where y = a( i ^-l)(£gj.) (8) 

[0 if x > t 

where $\beta_2$ is the similarity damping coefficient, $\alpha$ is the amplitude modulation and $x$ is the similarity measurement between two feature descriptors. Figure 8 shows $f_2(x, s)$ when $\beta_2 = 10$ and $\alpha = 20$.

Figure 8: Similarity filter $f_2(x, s)$ with $\beta_2 = 10$ and $\alpha = 20$.

3.2.3 Partition Combination

The bottom-up composition starts from the bottom leaves. The composition process iteratively combines partitions from the next lower layer and continues until it reaches the root of the quadtree.

For the leaves of the quadtree, each boundary is marked with a confidence value, cnf, given by $f_1(s) \cdot f_2(x, s)$, where $f_1(s)$ and $f_2(x, s)$ are the scale and similarity filters defined by equations 6 and 8 respectively. The relative scale descriptor $s$ is computed by equation 5 and $x$ is the similarity of two adjacent local segments. Figure 9 shows the function $cnf = f_1(s) \cdot f_2(x, s)$ with $\beta_1 = 8$, $\beta_2 = 10$ and $\alpha = 20$.




Figure 9: $f_1(s) \cdot f_2(x, s)$ with $\beta_1 = 8$, $\beta_2 = 10$ and $\alpha = 20$ (axes: similarity measurement $x$ and scale descriptor $s$).

For non-chaos leaves, the contours that form closed areas, together with the borders of the leaves, form the boundaries of the local regions. For chaos leaves, the boundaries are only the leaf borders. At each step of the bottom-up combination, four leaves are combined into the next lower layer. Figure 10 shows that several cases are possible when four leaves are combined:

i No interconnection happens during the combination;

ii A new segment is formed by connecting several local regions located in different leaves;

iii The boundaries of leaves happen to be the boundaries of segments.




Figure 10: Three cases of partition combination. 



In case (i), such as region A in Figure 10, cnf is calculated while the region is in leaf 1 and there is no need to recalculate cnf during the combination. In case (ii), such as region B, the region is formed by connecting four local regions. The cnf of each local region is calculated separately, but the cnf of region B needs to be recalculated after the four local regions are combined, since the new combined segment is located in a larger partition and the relative scale descriptor has increased. In addition, a feature descriptor for region B is generated from the feature descriptors of the local regions. In case (iii), the boundaries of the partitions happen to be the boundaries of the regions; one example is region C in Figure 10. During the combination process, the algorithm also calculates the cnf of the leaf boundaries. After the combination reaches the root of the quadtree, the process has generated the initial segment set $S = (s_0, \dots, s_n)$, the boundary confidence set $C_b = \{c_{i,j}\}$, where $c_{i,j}$ indicates the boundary probability between segments $s_i$ and $s_j$, $0 \le i \ne j \le n$, and the feature descriptor set $FD = (fd_0, \dots, fd_n)$. Algorithm 2 gives the iterative bottom-up composition.



input : T_qud: a decomposition quadtree
        id: root of T_qud
output: S = (s_0, ..., s_n): the initial segment set
        C_b = {c_{i,j}}: the boundary confidence set
        FD = (fd_0, ..., fd_n): the feature descriptor set

node <- read a node of T_qud by id;
if node has children then
    Read the four children of node as id1, id2, id3, id4;
    quad1 <- Bottomupcomposition(id1);
    quad2 <- Bottomupcomposition(id2);
    quad3 <- Bottomupcomposition(id3);
    quad4 <- Bottomupcomposition(id4);
    // Combine partitions. Recompute cnf = f_1(s) · f_2(x, s) if needed
    img <- combine(quad1, quad2, quad3, quad4);
else
    // This node is a leaf of T_qud
    img <- computelocalcnf(node);   // cnf = f_1(s) · f_2(x, s)
return img;

Algorithm 2: Bottomupcomposition



4 Hierarchical Probability Segmentation 

Hierarchical segmentation is a widely used technique for image segmentation. Regular hierarchical segmentation is modeled in layer-built structures. In contrast, Hierarchical Probability Segmentation (HPS) presents the hierarchical segmentation as a Probability Binary Tree (PBT), where the links are weighted by the confidence values, $cnf \in [0, 1]$. The root represents the image. Nodes represent segments and the children of a node are the sub-segments of this node. The initial segments $S = (s_0, \dots, s_n)$ compose the leaves of the PBT. Since the PBT is generated in a greedy manner, higher-level nodes always have higher probabilities than lower-level nodes. One can visualize the PBT with an arbitrary number of segments. Of course, this number cannot exceed the number of initial segments.

4.1 Probability Binary Tree (PBT)

Definition 6 (Probability Binary Tree (PBT)). Let $n_0$ denote the root of a PBT; $n_0$ represents the original image $I$. Nodes of the PBT denote segments and links represent the relationship of inclusion. Assume nodes $n_i$, $n_{i1}$, $n_{i2}$ and links $l_{i1}$, $l_{i2}$, where $n_{i1}$ and $n_{i2}$ are children of $n_i$, linked by $l_{i1}$ and $l_{i2}$ respectively. $n_i$, $n_{i1}$, $n_{i2}$, $l_{i1}$, $l_{i2}$ preserve the following properties:

i Let $DS = \{(n_{i1}^{0}, n_{i2}^{0}), (n_{i1}^{1}, n_{i2}^{1}), \dots\}$ denote the set of all possible pairs of sub-segments of $n_i$. The function $g(n_{i1}^{j}, n_{i2}^{j})$ gives the cnf of segments $n_{i1}^{j}$ and $n_{i2}^{j}$. Assume $(n_{i1}, n_{i2}) = \arg\max g(DS)$. Then $l_{i1}$ and $l_{i2}$ are weighted by $g(n_{i1}, n_{i2})$;

ii for any element $(n_{i1}^{j}, n_{i2}^{j})$ of $DS$, $n_i = n_{i1}^{j} \cup n_{i2}^{j}$ and $\emptyset = n_{i1}^{j} \cap n_{i2}^{j}$.

Definition 6 defines the PBT recursively. The PBT has the following properties:

i Every PBT node (except the root) is contained in exactly one parent node;

ii every PBT node (except the leaves) is spanned by two child nodes;

iii a number of pairs of nodes span $n_i$. These pairs are candidates to be the children of $n_i$ and each pair is labeled with the probability of its two nodes, $\{cnf_0 = g(\langle n_{i1}^{0}, n_{i2}^{0}\rangle), cnf_1 = g(\langle n_{i1}^{1}, n_{i2}^{1}\rangle), \dots\}$. The PBT chooses the pair $\langle n_{i1}^{j}, n_{i2}^{j}\rangle$ with the highest $cnf_j$ to span the node $n_i$. The links $l_{i1}$ and $l_{i2}$ are weighted by $cnf_j$;

iv assume a node $n_i$ with a link $l_i$ pointing in from its parent and links $l_{i1}$, $l_{i2}$ pointing out to its children. The weights of $l_{i1}$, $l_{i2}$ must be no larger than the weight of $l_i$;

v if two nodes (segments) overlap, one of them must be a child of the other.

By presenting the segmentation as a PBT, the image is recursively partitioned into the two segments with the highest probability among all possible pairs of segments. Figure 11 gives an example of the PBT.

Let $C_b = \{c_{i,j}\}$ denote the set of boundary probabilities. $c_{i,j} \in C_b$ represents the boundary probability between initial segments $s_i$ and $s_j$, where $c_{i,j} \in [0, 1]$, $0 \le i \ne j \le n$. HPS constructs the PBT in a bottom-up manner, which means that the leaves are created first and the root is the last node created. To generate a PBT greedily, $C_b$ is sorted in ascending order. Let $T_{pbt}$ be a PBT and $C'_b$ be the ascending-order sequence of $C_b$. Algorithm 3 describes how to generate $T_{pbt}$.

(i) The nodes labeled by numbers denote the leaves, which are the initial segments.

(ii) The nodes labeled by letters denote the inner nodes, which represent the segments formed by combining initial segments.

(iii) The length and width of the links represent the weights of the links, which are the probabilities of the segments.

Figure 11: Probability Binary Tree.





input : C'_b: C_b sorted in ascending order
output: T_pbt: a probability binary tree, PBT

while C'_b is not empty do
    c_{i,j} <- read the first element of C'_b and remove it from C'_b;
    i <- read the index of segment s_i from c_{i,j};
    j <- read the index of segment s_j from c_{i,j};
    if s_i does not exist in T_pbt then
        Create a node s_i in T_pbt;
    else
        s_i <- read s_i from T_pbt;
    if s_j does not exist in T_pbt then
        Create a node s_j in T_pbt;
    else
        s_j <- read s_j from T_pbt;
    Create a new node n_new, which is the parent of s_i and s_j;
    Create the links from n_new to s_i and from n_new to s_j, weighted by c_{i,j};
    Replace index i and j in the current C'_b by the index of n_new and remove the duplicate elements in C'_b;

Algorithm 3: Generating a probability binary tree.
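
The greedy construction in Algorithm 3 can be sketched in Python as below. This is an illustration under stated assumptions, not the original implementation: the function name `build_pbt` and the dictionary representation are hypothetical, and when two boundary entries collapse into one after a merge the sketch keeps the smaller confidence, a choice the text does not specify. Taking the minimum at every step corresponds to reading the first element of the ascending sequence $C'_b$.

```python
def build_pbt(n_segments, confidences):
    """Greedy bottom-up construction of a probability binary tree.

    confidences maps frozenset({i, j}) -> c_ij for adjacent initial segments.
    Returns (parent, weight): parent[v] is the id of the parent of node v and
    weight[v] is the weight of the link between v and its parent.
    """
    edges = dict(confidences)
    parent, weight = {}, {}
    next_id = n_segments                 # ids 0 .. n_segments - 1 are leaves
    while edges:
        pair = min(edges, key=edges.get)  # lowest remaining confidence
        c = edges.pop(pair)
        i, j = tuple(pair)
        new = next_id
        next_id += 1
        for child in (i, j):
            parent[child] = new
            weight[child] = c
        merged = {}
        for other, value in edges.items():
            renamed = frozenset(new if v in (i, j) else v for v in other)
            # Assumption: duplicated entries keep the smaller confidence.
            merged[renamed] = min(value, merged.get(renamed, value))
        edges = merged
    return parent, weight

# Five initial segments in a chain, loosely following Figure 11.
c_b = {
    frozenset({0, 1}): 0.1,
    frozenset({1, 2}): 0.2,
    frozenset({2, 3}): 0.9,
    frozenset({3, 4}): 0.3,
}
parent, weight = build_pbt(5, c_b)
print(parent)
```

In this example segments 0, 1 and 2 are merged under one inner node and segments 3 and 4 under another, and the two inner nodes are joined last by the strongest boundary. If the adjacency graph is disconnected, the loop ends with a forest rather than a single tree.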



4.2 Visualization of the Segmentation 

Generally, there are two ways to visualize the segments: threshold-based visualization and number-based visualization. As discussed in the previous section, the root of the PBT represents the original image. For the other nodes, the shallower the position, the more coarse-grained the segment. For example, visualizing the image as segments a and b in Figure 11 means combining initial segments 1, 2 and 3 to form segment a and combining initial segments 4 and 5 to form segment b.

Suppose a threshold $t_{visual} \in [0, 1]$ is selected for visualization. By the properties of the PBT, the weights of the links are the probabilities of the segments, and the weights decrease as the depth increases. Given $t_{visual}$, threshold-based visualization displays only the segments whose link weights are greater than $t_{visual}$.

Number-based visualization displays a given number of segments. Let $n_{visual}$ denote the number of segments to visualize. The implementation is straightforward: the algorithm sorts the nodes in descending order of link weight and picks the first $n_{visual}$ nodes to display.
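
Both visualization modes can be read as a top-down cut of the PBT. The sketch below follows that reading, which is an interpretation rather than the original implementation: a node is split into its two children while the weight of the links to those children passes the test, and the nodes left unsplit are displayed. The tree is given as `parent` and `weight` dictionaries (child id to parent id, child id to link weight), and all names are hypothetical.

```python
def children_of(parent):
    """Invert a child -> parent map into parent -> list of children."""
    kids = {}
    for child, p in parent.items():
        kids.setdefault(p, []).append(child)
    return kids

def cut_by_threshold(root, parent, weight, t_visual):
    """Split nodes whose child links weigh more than t_visual."""
    kids = children_of(parent)
    shown, stack = [], [root]
    while stack:
        node = stack.pop()
        pair = kids.get(node)
        if pair and weight[pair[0]] > t_visual:
            stack.extend(pair)
        else:
            shown.append(node)
    return sorted(shown)

def cut_by_number(root, parent, weight, n_visual):
    """Split the strongest remaining boundary until n_visual segments exist."""
    kids = children_of(parent)
    shown = [root]
    while len(shown) < n_visual:
        splittable = [v for v in shown if v in kids]
        if not splittable:
            break                          # only initial segments are left
        best = max(splittable, key=lambda v: weight[kids[v][0]])
        shown.remove(best)
        shown.extend(kids[best])
    return sorted(shown)

# Example tree: leaves 0-4, inner nodes 5-7, root 8.
parent = {0: 5, 1: 5, 5: 6, 2: 6, 3: 7, 4: 7, 6: 8, 7: 8}
weight = {0: 0.1, 1: 0.1, 5: 0.2, 2: 0.2, 3: 0.3, 4: 0.3, 6: 0.9, 7: 0.9}
print(cut_by_threshold(8, parent, weight, 0.5))
print(cut_by_number(8, parent, weight, 3))
```

With a threshold of 0.5 only the root is split, which leaves the two coarse segments 6 and 7; asking for three segments additionally splits node 7, whose internal boundary is the next strongest.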

5 Algorithm Complexity Analysis 

Since the algorithm is divided into two stages, EDHS and HPS, we discuss 
the computational complexity of them separately. 

5.1 EDHS Computational Complexity 

Assume the depth of the quadtree generated by top-down decomposition is $d$ and the depth of the root is zero. The maximum $d$ is $\min(\lfloor\log_2 n\rfloor, \lfloor\log_2 m\rfloor) - \lceil\log_2 l\rceil$, where $n \times m$ is the size of the original image and $l$ is the chaos threshold. Depending on the image and the chosen stopping condition $\zeta$, the decomposition process generates an unbalanced quadtree of depth $d$. To analyze the complexity of the decomposition, we assume the worst case, in which the image is fully decomposed. This implies that the depth of all the leaves is $d = \min(\lfloor\log_2 n\rfloor, \lfloor\log_2 m\rfloor) - \lceil\log_2 l\rceil$. At the $i$th level of the quadtree, there are $4^i$ nodes and the size of each node is $\frac{mn}{4^i}$. The running time of computing the stopping condition $\zeta$ at the $i$th depth is then $4^i \cdot \frac{mn}{4^i} = mn$ and the total running time of the decomposition is $dmn$. The time complexity of an edge detector, such as Canny Edge Detection [B], is commonly $O(mn)$, and generating the feature descriptors adds another $O(mn)$. The running time of top-down decomposition is therefore $(d + 2)mn$, which gives the time complexity of top-down decomposition $O(mn)$.

The combination process starts from the leaves, calculating the boundary confidence $cnf = f_1(s) \cdot f_2(x, s)$. This gives a running time of $\frac{mn}{4^d}$ for each leaf (the leaf size is $\frac{mn}{4^d}$) and a total running time of $4^d \cdot \frac{mn}{4^d} = mn$, since there are $4^d$ leaves in total. At the $i$th level of the quadtree, the composition process combines the four quadrants into one, which gives the running time $m + n + \frac{mn}{4^i}$, and the total running time at the $i$th depth is $4^i\bigl(m + n + \frac{mn}{4^i}\bigr) = mn + 4^i(m + n)$. The total running time of bottom-up composition is then $(d + 1)mn + \frac{4^d - 1}{3}(m + n)$. When $d$ is large enough, the term $\frac{4^d - 1}{3}(m + n)$ dominates the running time, which gives the time complexity $O(4^d(m + n))$. The time complexity of EDHS is therefore $O(4^d(m + n))$, where $d$ is the depth of the quadtree and $mn$ is the size of the input image.
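
As a quick check of the closed form for the bottom-up composition, the sketch below sums the per-level costs directly and compares the result with $(d + 1)mn + \frac{4^d - 1}{3}(m + n)$. It assumes that the combination step runs on levels $0$ to $d - 1$ and that the leaves contribute $mn$ in total; the function names are hypothetical.

```python
def composition_cost(m, n, d):
    """Sum the leaf cost and the per-level combination costs."""
    leaf_cost = 4**d * (m * n / 4**d)
    merge_cost = sum(4**i * (m + n + m * n / 4**i) for i in range(d))
    return leaf_cost + merge_cost

def closed_form(m, n, d):
    """(d + 1) mn + (4^d - 1) / 3 * (m + n)."""
    return (d + 1) * m * n + (4**d - 1) / 3 * (m + n)

m, n, d = 512, 256, 6
step_by_step = composition_cost(m, n, d)
formula = closed_form(m, n, d)
print(step_by_step, formula)
```

The two values agree because the sum of $4^i$ for $i = 0, \dots, d - 1$ equals $\frac{4^d - 1}{3}$.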

5.2 HPS Computational Complexity 

To keep the analysis simple, we assume that the maximum $d$ is $\log_2 n$ (the worst case). This means that the maximum number of leaves is $4^{\log_2 n} = n^2$, so in the worst case each pixel is one local segment. At pixel level, HPS treats the four neighbors of a pixel (left, right, upper and lower) as adjacent local segments. The maximum size of $C_b$ is then $2mn$. HPS first sorts $C_b$ into $C'_b$, which has time complexity $O(mn\log(mn))$. For the algorithm that generates $T_{pbt}$, the total running time is $\sum_{i=1}^{2mn}(2mn - i) = 2(mn)^2 - mn$, which gives time complexity $O((mn)^2)$. Compared with EDHS, generating $T_{pbt}$ is the more expensive part. In real applications, the number of elements in $C_b$ is far less than $2mn$, since setting the size threshold of the partitions prevents the algorithm from going down to pixel level. The running time also depends strongly on the input image, because the number of local segments depends on the complexity of the image itself. In our experiments on the image set [35], the average number of elements of $C_b$ is around 3000 or less. The practical time complexity of the algorithm is therefore far less than $O((mn)^2)$.

6 Experimental Results 

In this section, we demonstrate experiments with VHBS under different parameter settings and evaluate the results by comparing them with the outputs of Normalized Cut [H] and KMST [201 EI] on the same test set of the Berkeley Segmentation Dataset (BSDB) [35].

6.1 Weighted Boundary Evaluations 

As discussed in the previous section, visualizing the boundary probability set $C_b$ gives the weighted boundaries of the initial segments $S = (s_0, \dots, s_n)$. Some examples are demonstrated in Figure 10. We use the benchmark provided by BSDB, which uses Precision-Recall curves [1H] to evaluate boundary detection. Precision is defined as the number of true positives divided by the number of elements retrieved by the detection, and Recall is defined as the number of true positives divided by the total number of existing relevant elements. Precision can be viewed as a measure of correctness and Recall as a measure of completeness.
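
The two measures can be illustrated with a minimal Python sketch. It treats detected and reference boundary pixels as sets of coordinates and counts only exact matches; this is a simplification for illustration, since a boundary benchmark typically allows some spatial tolerance when matching pixels. The function name `precision_recall` is hypothetical.

```python
def precision_recall(detected, relevant):
    """Precision and recall of a detected set against a reference set."""
    detected, relevant = set(detected), set(relevant)
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

detected = {(0, 0), (0, 1), (1, 1), (2, 2)}
relevant = {(0, 0), (0, 1), (1, 1), (3, 3), (4, 4)}
p, r = precision_recall(detected, relevant)
print(p, r)
```

Here three of the four detected pixels are correct (precision 0.75) and three of the five reference pixels are found (recall 0.6).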

Since the purpose of our algorithm is to locate the segments, the algorithm only marks the contours that form the boundaries of regions. Discontinuous or unclosed contours are ignored. The ignored contours fall into three groups: the contours within chaos leaves; the contours in non-chaos leaves that are discontinuous or unclosed; and the boundaries between two regions whose similarity measure is higher than the threshold $t$ (equation 8). As a result, the Precision-Recall curves are mainly located at the left side of the PR graphs.

The first set of experiments examines different values of $k$, which was discussed in Section 3.1.2. Recall that one purpose of the decomposition is to reduce the information contained in the leaves. From this point of view, a small integer is preferred for $k$, since small values of $k$ give a stricter stopping condition. On the other hand, we need to avoid too many chaos leaves, since chaos means that the algorithm has failed to recognize these regions. The ideal value of $k$ should therefore be as small as possible but large enough to keep the number of chaos leaves low. We call this value the optimization point. Obviously, different images have different optimization points. Figure 12 shows the boundary detection Precision-Recall curves for different values of $k$ over the BSDB test set. As shown in the figure, the Precision-Recall performance improves as $k$ increases, since larger values of $k$ give fewer chaos leaves. Beyond a certain point, however, there is no further improvement because the number of chaos leaves is zero.

Based on this observation about $k$, we used a program to locate the optimization point of each image automatically. The program is simple: it pre-generates the decomposition quadtree and selects the smallest integer $k$ in the range $[3, mn]$ that yields zero chaos leaves. The remaining experiments are run at the optimization point of each image.
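
The search for the optimization point amounts to a linear scan over $k$. The sketch below assumes a callable that decomposes the image with a given $k$ and returns the number of chaos leaves; that callable, the toy counts and the name `optimization_point` are all hypothetical stand-ins.

```python
def optimization_point(count_chaos_leaves, k_max, k_min=3):
    """Smallest k in [k_min, k_max] that yields zero chaos leaves, else None."""
    for k in range(k_min, k_max + 1):
        if count_chaos_leaves(k) == 0:
            return k
    return None

# Toy stand-in for "decompose the image with this k and count chaos leaves".
toy_counts = {3: 40, 4: 17, 5: 6, 6: 0, 7: 0}
best_k = optimization_point(lambda k: toy_counts.get(k, 0), k_max=7)
print(best_k)
```

In practice the callable would rebuild the decomposition quadtree for each candidate $k$, so the scan costs one decomposition per value tried.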

We also show the experiments based on the damping coefficient $\beta_1$ of the scale filter ($f_1$, equation 6) in Figure 13 (ii), the damping coefficient $\beta_2$ of the similarity filter ($f_2$, equation 8) in Figure 13 (iii), the amplitude modulation $\alpha$ of the similarity filter ($f_2$, equation 8) in Figure 13 (i) and the similarity threshold $t$ of the similarity filter ($f_2$, equation 8) in Figure 13 (iv). As demonstrated, different values of these parameters shift the Precision-Recall curves only slightly. Based on these experiments, we choose $\alpha = 1$, $\beta_1 = 8$, $\beta_2 = 3$ and $t = 0.994$ for the remaining experiments. Figure 14 provides some examples of boundary detections.

Figure 12: Boundary evaluation based on k.



6.2 Visualization of the Segments 

We show some examples of natural images with different numbers of segments in Figure 15. The visualizations of the segments are generated top-down from the PBT. Since the PBT is constructed from the probabilities of the segments, the segments located at higher levels of the PBT have higher probabilities. Visualizing the PBT with a small number of segments gives the general outline of the image, since only the segments with high probabilities are shown. More detail appears as the visualization goes deeper into the PBT.

(i) Boundary evaluation based on amplitude modulation

(ii) Boundary evaluation based on scale damping coefficient

(iii) Boundary evaluation based on similarity damping coefficient

(iv) Boundary evaluation based on similarity threshold t

Figure 13: Boundary evaluation based on $\alpha$, $\beta_1$, $\beta_2$ and $t$.

Figure 14: Examples of the weighted boundaries.

6.3 Evaluation Experiments by Tuning Parameters 

Rather than relying on subjective inspection of the algorithm's output, the segmentation results need to be compared quantitatively with other existing methods. One commonly used approach is supervised evaluation [7], which compares the segmentation results against manually segmented reference images. The disadvantage of such methods is that the judgment of segmentation quality is inherently subjective. There is therefore a need to evaluate image segmentations objectively. Such methods are called unsupervised evaluation [M] methods, such as [531 El E].

To evaluate the results of segmentation quantitatively and objectively, [25] proposed four criteria: (i) the characteristics should be homogeneous within the segments; (ii) the characteristics should be quite different between adjacent segments; (iii) the shape of the segments should be simple and without holes inside; (iv) the boundaries between segments need to be smooth and continuous, not ragged. For our experiments, we choose $Q(I)$ [3], $H_r(I)$, $H_l(I)$ and $E$ [53] as evaluators. $Q(I)$ and $H_r(I)$ measure the intra-region uniformity, which corresponds to criterion (i). $H_l(I)$ measures the inter-region uniformity, which corresponds to criterion (ii). $E$ is defined as $E = H_r(I) + H_l(I)$, which combines criteria (i) and (ii). Table 1 gives the details of the evaluators used in this chapter. The disadvantage of unsupervised evaluation is that these criteria might not be appropriate for natural images, since human segmentation is based on semantic understanding; the evaluators might therefore reach conclusions that differ from human perception.

Figure 15: Examples of the segmentation results by VHBS.



Figure 16 gives the experimental evaluations based on the damping coefficient $\beta_1$ of the scale filter $f_1$ (equation 6), and on the damping coefficient $\beta_2$, the amplitude modulation $\alpha$ and the similarity threshold $t$ of the similarity filter $f_2$ (equation 8).

Figure 16: Experimental evaluations based on $\beta_1$, $\beta_2$, $\alpha$ and $t$. Panels: (i) $\beta_1$, damping coefficient of $f_1$; (ii) $\beta_2$, damping coefficient of $f_2$; (iii) $\alpha$, amplitude modulation of filter $f_2$; (iv) similarity threshold $t$.



Figure 16 demonstrates that the evaluations Q(I), H_r(I), H_l(I) and E do
not vary much with the damping coefficients β_1 and β_2 or with the amplitude
modulation α. This suggests that Q(I), H_r(I), H_l(I) and E are not sensitive
to the parameters β_1, β_2 and α. Figure 16 (iv) presents the same information
as Figure 13 (iv): a similarity threshold of t = 1 clearly gives a worse Q(I)
than the other similarity thresholds. This is the reason we do not set t = 1
in the experiments.

Table 1: The unsupervised evaluators used in this chapter.

Evaluator   Description                                    Formula

Q(I)        Intra-region evaluator based on color error    Q(I) = \frac{\sqrt{R}}{10000 \, NM} \sum_{i=1}^{R} \left[ \frac{e_i^2}{1 + \log S_i} + \left( \frac{R(S_i)}{S_i} \right)^2 \right]

H_r(I)      Intra-region evaluator based on entropy        H_r(I) = \sum_{i=1}^{R} \frac{S_i}{S_I} H(X_i)
                                                           H(X_i) = -\sum_{m \in V_i^{(p)}} \frac{L_i(m)}{S_i} \log \frac{L_i(m)}{S_i}

H_l(I)      Inter-region evaluator based on entropy        H_l(I) = -\sum_{i=1}^{R} \frac{S_i}{S_I} \log \frac{S_i}{S_I}

E [53]      Composite evaluator                            E = H_r(I) + H_l(I)

I: the segmented image
NM: the size of the image
R: the number of regions in the segmented image
S_i: the area, in pixels, of the ith segment
S_I: the area, in pixels, of the image I
R(S_i): the number of segments having an area equal to S_i
e_i^2: the color error in RGB space, defined as
    e_i^2 = \sum_{\gamma \in \{r,g,b\}} \sum_{p \in X_i} \left( C_\gamma(p) - \hat{C}_\gamma(X_i) \right)^2
  where C_\gamma(p) is the value of component \gamma of pixel p and
    \hat{C}_\gamma(X_i) = \frac{1}{S_i} \sum_{p \in X_i} C_\gamma(p)
V_i^{(p)}: the set of all possible values associated with feature p in segment i
L_i(m): the number of pixels in the ith segment that have a value of m for feature p
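The entropy-based evaluators of Table 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the evaluation code used in the experiments: the function names, the flat-list input format, the use of a single discrete feature (for example luminance) and the natural logarithm are all choices made here for illustration.

```python
import math
from collections import Counter


def region_entropy(values):
    """Entropy H(X_i) of the discrete feature values inside one segment."""
    n = len(values)
    counts = Counter(values)  # L_i(m) for every observed value m
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_evaluators(labels, feature):
    """Return (H_r, H_l, E) for one segmented image.

    labels  -- flat list of segment ids, one per pixel (hypothetical format)
    feature -- flat list of discrete feature values, same length as labels
    """
    if not labels or len(labels) != len(feature):
        raise ValueError("labels and feature must be non-empty and equally long")

    # Group the feature values of each segment together.
    segments = {}
    for seg_id, value in zip(labels, feature):
        segments.setdefault(seg_id, []).append(value)

    total = len(labels)  # S_I, the area of the whole image
    # H_r: area-weighted sum of the per-segment entropies.
    h_r = sum(len(v) / total * region_entropy(v) for v in segments.values())
    # H_l: entropy of the segment-area distribution.
    h_l = -sum(len(v) / total * math.log(len(v) / total)
               for v in segments.values())
    return h_r, h_l, h_r + h_l


# Two perfectly uniform segments of equal size: H_r is 0 and H_l is log 2.
h_r, h_l, e = entropy_evaluators([0, 0, 1, 1], [5, 5, 9, 9])
print(h_r, h_l, e)
```

In this toy input each segment contains a single feature value, so the intra-region term vanishes and E reduces to the layout entropy of two equal-area segments.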



Figure 17 shows the evaluations Q(I), H_r(I), H_l(I) and E as functions of
the parameter k, where N_nc and N_c are the numbers of non-chaos leaves and
chaos leaves. Figure 17 (i) demonstrates that the numbers of both chaos and
non-chaos leaves decrease as k increases. Note that the number of non-chaos
leaves increases slightly at the far left of the graph because many areas are
transferring from chaos to non-chaos. Based on Figure 17 (i), we can judge
that the average optimal point lies around k = 25 for the whole test set of
BSDB, since N_c is close to zero when k is about 25. The evaluation Q(I)
shown in Figure 17 (ii) strongly supports this estimate. Naturally, the
optimal point differs from image to image.





Figure 17: Experimental evaluations based on k. Panels: (i) number of chaos
and non-chaos leaves based on k; (ii) evaluations based on k.
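The way Figure 17 (i) is read above, taking the smallest k at which the number of chaos leaves N_c is close to zero, can be sketched as follows. This is only an illustration: the function name, the tolerance parameter and the numbers in the example are hypothetical and are not the measurements plotted in Figure 17.

```python
def smallest_k_without_chaos(chaos_counts, tolerance=0.0):
    """Return the smallest k whose average chaos-leaf count is <= tolerance.

    chaos_counts -- dict mapping a value of k to the average N_c measured
                    over the test set (hypothetical input format)
    Returns None when no k satisfies the tolerance.
    """
    for k in sorted(chaos_counts):
        if chaos_counts[k] <= tolerance:
            return k
    return None


# Made-up values for illustration only; they are not taken from Figure 17.
example = {10: 9.0, 20: 3.0, 30: 0.4, 40: 0.0}
print(smallest_k_without_chaos(example, tolerance=0.5))  # prints 30
```

Because the optimal point differs from image to image, the same selection could also be run per image rather than on the test-set average.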



6.4 Comparison with Other Segmentation Methods 

In this section, we compare VHBS with Ncut [H] and KMST [201 EI] over 
the entire test set of the Berkeley Segmentation Dataset and Benchmark [35].
The source code of Ncut and KMST was obtained from the authors' websites. To
compare the three algorithms fairly, we tune the parameters so that they
output the same number of segments for each image in the test set. Figure 18
provides the evaluations Q(I), H_r(I), H_l(I) and E as functions of the
number of segments.




Figure 18: Comparison of VHBS with Ncut and KMST.

Figure 18 demonstrates that our algorithm gives the best Q(I) and H_r(I) for
almost every number of segments, while Normalized Cut gives the best H_l(I).
Let us take a closer look at the evaluator H_l(I). As [53] points out, H_l(I)
favors very few large segments and many small segments. In other words, a
segmentation with very few large segments and many small segments receives a
favorable H_l(I) evaluation.
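A small worked example makes this property concrete. The 100-pixel image and both layouts are hypothetical, natural logarithms are assumed, and lower entropy is assumed to be the preferred direction, as is usual for entropy-based evaluators. Using the layout entropy of Table 1, H_l(I) = -\sum_i (S_i / S_I) \log (S_i / S_I):

```latex
% Layout A: four equal segments of 25 pixels each (S_I = 100)
H_l^{A} = -4 \cdot \frac{25}{100} \ln \frac{25}{100} = \ln 4 \approx 1.386

% Layout B: one segment of 97 pixels and three segments of 1 pixel each
H_l^{B} = -\frac{97}{100} \ln \frac{97}{100} - 3 \cdot \frac{1}{100} \ln \frac{1}{100}
        \approx 0.030 + 0.138 = 0.168
```

Layout B, with one dominant segment and a few tiny ones, obtains a much smaller layout entropy than the balanced Layout A, even though both contain four segments.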

It is expected that VHBS performs poorly on the evaluator H_l(I), because
H_l(I) contradicts the mechanism of scale filter f_1, which favors large
areas. Scale filter f_1 [6] makes VHBS prefer large-area segments, which is
more consistent with human visual perception. Meanwhile, similarity filter
f_2 [8] favors segments that preserve high internal uniformity, because f_2
marks boundaries with high weights when two adjacent regions are quite
different. Based on the discussion above, it is not hard to understand why
VHBS performs well on Q(I) and H_r(I) but not on H_l(I) and E: H_l(I) and E
are penalized by the number of small-area segments.

7 Conclusion 

Our contribution lies in proposing a new low-level image segmentation
algorithm, VHBS, which obeys two visual hint rules. Unlike most unsupervised
segmentation methods, which are based on clustering techniques, VHBS is based
on human perception: f_1 and f_2 are designed according to the two visual
hint rules and to some extent contradict clustering ideas. The experimental
evaluations demonstrate that VHBS performs well on natural images. VHBS also
remains efficient because, by using the entropy and chaos thresholds as the
stopping condition of the image decomposition, it does not go down to the
pixel level. In addition to outputting the segments of the given images, VHBS
provides feature descriptors, which are statistical summaries of each
segment. To improve performance, one direction for future work is to embed
the algorithm in a learning scheme so that optimized parameters are obtained
by a learning process.